Computational methods of EEG signals analysis for Alzheimer’s disease classification

Computational analysis of electroencephalographic (EEG) signals has shown promising results in detecting brain disorders such as Alzheimer’s disease (AD). AD is a progressive neurological illness that causes neuronal degeneration, resulting in cognitive impairment. While there is no cure for AD, early diagnosis is critical to improving the quality of life of affected individuals. Here, we apply six computational time-series analysis methods (wavelet coherence, fractal dimension, quadratic entropy, wavelet energy, quantile graphs and visibility graphs) to EEG records from 160 AD patients and 24 healthy controls. Results from raw and wavelet-filtered (alpha, beta, theta and delta bands) EEG signals show that some of the time-series analysis methods tested here, such as wavelet coherence and quantile graphs, can robustly discriminate AD patients from healthy elderly subjects. They represent a promising non-invasive and low-cost approach to AD detection in elderly patients.


Methods
Data preprocessing. EEG signals contain electrical activity from the brain as well as artefacts from other sources, such as muscle movements and physical interference from the equipment 43,44. Preprocessing those signals before analyzing them is therefore crucial. Digital filtering has become the most popular method of rejecting unwanted information in certain EEG frequencies 45. The Discrete Fourier Transform (DFT) has proven to be a powerful tool for developing digital filters, since it generates the spectrum of a given signal. The spectrum represents the original signal in the frequency domain and gives the power of each frequency component present in it. Thus, it is possible to attenuate a given frequency band in the spectrum and, through the inverse Fourier transform, return to the time domain without that band 46. Nevertheless, the DFT has limitations when it is applied to non-stationary signals such as EEG 47. Stationary signals are those whose statistical measures, such as mean, variance and covariance, remain constant over time in any sample of the input data 48. Many of the DFT limitations were overcome by the use of wavelets, mathematical functions that decompose a signal into various time-frequency scales by convolution operations 49. Decomposing a signal with wavelets generates coefficients that retain the signal information; these coefficients can therefore be used to reconstruct the signal through the inverse operation 50.
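The DFT-based filtering idea described above can be illustrated with a minimal sketch. It is not the preprocessing pipeline of this study: the sampling rate, the test signal, and the function names (`dft`, `idft`, `remove_band`) are all hypothetical, and a naive O(n^2) transform is used only to keep the example self-contained.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT, returning complex samples."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def remove_band(x, fs, f_lo, f_hi):
    """Zero every DFT bin whose frequency lies in [f_lo, f_hi] Hz, then invert."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        # Bin k sits at k*fs/n Hz; bins above n/2 mirror the negative frequencies.
        freq = min(k, n - k) * fs / n
        if f_lo <= freq <= f_hi:
            X[k] = 0
    return [v.real for v in idft(X)]

fs = 128  # hypothetical sampling rate in Hz (assumption, not from the study)
n = 128
slow = [math.sin(2 * math.pi * 2 * t / fs) for t in range(n)]         # 2 Hz component
fast = [0.5 * math.sin(2 * math.pi * 20 * t / fs) for t in range(n)]  # 20 Hz component
signal = [s + f for s, f in zip(slow, fast)]

# Attenuate the 15-30 Hz band; only the 2 Hz component should remain.
filtered = remove_band(signal, fs, 15, 30)
```

Because both test frequencies fall exactly on DFT bins here, the band is removed cleanly; for a non-stationary signal such as EEG this global spectrum hides when each frequency occurs, which is the limitation that motivates wavelets.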
Feature extraction. We present six of the best-known techniques used in the literature for distinguishing AD patients from healthy elderly subjects through EEG signals. Each technique was evaluated in terms of its capacity to discriminate between the groups of subjects. For this purpose, the area under the ROC curve (AUC) and the p value from the ANOVA test were calculated. The AUC is a widely used measure of the effectiveness of a given diagnostic marker. It is estimated from the true positive and true negative rates obtained in a two-group classification. In this sense, the AUC assumes values from 0.5 (no apparent distinction) to 1.0 (perfect distinction) between the two groups 53. ANOVA, in turn, is a statistical test of the null hypothesis that two or more samples come from the same population. Its p value is the probability, under the null hypothesis, of observing differences between the samples at least as large as those measured, so that the evidence for more than one population grows as the p value comes closer to 0 54. Moreover, from the point of view of computational cost, the execution time as a function of the signal input size was calculated for each technique.
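As a rough illustration of the AUC described above, the sketch below uses the pairwise (rank) formulation: the AUC equals the probability that a randomly chosen patient has a higher feature value than a randomly chosen control. The function names and the toy values are hypothetical, and folding the result into the 0.5 to 1.0 range is an assumption made here so that the direction of the difference does not matter.

```python
def auc(controls, patients):
    """Fraction of (patient, control) pairs in which the patient scores higher.

    Ties count as half a win.
    """
    wins = 0.0
    for p in patients:
        for c in controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(controls))

def separability(controls, patients):
    """Direction-free AUC in [0.5, 1.0]: 0.5 is no distinction, 1.0 is perfect."""
    a = auc(controls, patients)
    return max(a, 1.0 - a)

# Toy feature values (hypothetical, not data from the study).
perfect = separability([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
chance = separability([1.0, 2.0], [1.0, 2.0])
```

With these toy values, `perfect` is 1.0 because every patient value exceeds every control value, and `chance` is 0.5 because the two groups are identical.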
Wavelet coherence (C). Wavelets are special functions that satisfy certain mathematical requirements and are used to represent data or other functions by decomposing them into a series of coefficients. Wavelet algorithms process data at different scales or resolutions 55. The Wavelet Transform (WT) is used to calculate the coefficients C(a, b) at scale a and time b as follows:

$$C(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt, \tag{1}$$

where f(t) is a time series or a function, $\psi$ is the wavelet function, and $\psi^{*}$ is its complex conjugate.
The coherence C between two time series X and Y can be calculated using the coefficients generated by a WT. This measure represents the agreement between X and Y at different frequency levels through the time domain 56. It can be calculated as follows:

$$C = \frac{\left| S\!\left( C_X^{*}(a, b)\, C_Y(a, b) \right) \right|^{2}}{S\!\left( |C_X(a, b)|^{2} \right)\, S\!\left( |C_Y(a, b)|^{2} \right)},$$

where $C_X(a, b)$ and $C_Y(a, b)$ are the wavelet coefficients of X and Y, * denotes complex conjugation, and S is a smoothing operator in time and scale.

Fractal dimension (F). In recent years, the study of nonlinear systems has made possible the creation of metrics capable of quantifying random properties in time series 57. One of these metrics is the Fractal Dimension (FD), which measures the complexity and the self-similarity of a given signal 58. A fractal can be defined as a geometric structure whose parts repeat the spatial patterns of the whole. In this sense, the FD of a signal represents its degree of randomness: totally random series present no pattern repetition and, as a consequence, are more complex. Different algorithms have been proposed to estimate the fractality of a signal 59,60. In this paper, we applied the one proposed by Katz 60. First, the total length D of a signal X with T points, defined as the sum of the distances between adjacent points, is computed as follows:

$$D = \sum_{i=1}^{T-1} \operatorname{dist}\left(x_i, x_{i+1}\right).$$

Next, the greatest distance d between the first point of X and all the successive points is computed as follows:

$$d = \max_{2 \le i \le T} \operatorname{dist}\left(x_1, x_i\right).$$

Finally, the fractal dimension F defined by Katz can be obtained by:

$$F = \frac{\log_{10}(n)}{\log_{10}(n) + \log_{10}(d / D)},$$

where n = T − 1 is the number of steps between adjacent points.

Quadratic entropy (Q). The physical concept of entropy represents the degree of freedom in a dynamic system. In other words, entropy measures the degree of disorder of that system, following the idea that totally random systems are completely disordered 61. For a given signal, the entropy quantifies its regularity, so that signals with many degrees of freedom also have high entropy and are less regular 62. The entropy of a signal X with T points is calculated by finding matches between its T − m + 1 partitions of m adjacent points each. The distance d between two partitions $X_m(i)$ and $X_m(j)$, with i, j = 1, 2, ..., T − m + 1, is defined as the largest absolute difference between their scalar components 63,64:

$$d\left[X_m(i), X_m(j)\right] = \max_{0 \le k \le m-1} \left| x(i + k) - x(j + k) \right|.$$

Then, $B^m(r)$ is calculated as the probability that two partitions match for m points with tolerance r 65:

$$B^m(r) = \frac{1}{T - m} \sum_{i=1}^{T-m} \frac{B_i}{T - m - 1},$$

where r denotes the tolerance level and $B_i$ is the number of indices j, with 1 ≤ j ≤ T − m and j ≠ i, for which $d\left[X_m(i), X_m(j)\right] \le r$. The same steps are repeated for partitions of m + 1 points, giving the counts $A_i$ and the probability $A^m(r)$:

$$A^m(r) = \frac{1}{T - m} \sum_{i=1}^{T-m} \frac{A_i}{T - m - 1}.$$

When calculated for short time series, the entropy is limited 66. Furthermore, studies have shown that the tolerance r can be varied more freely when the entropy is expressed in terms of the Quadratic Entropy (QE). This measure can be obtained by:

$$Q = E + \ln(2r), \tag{10}$$

where $E = -\ln\left(A^m(r) / B^m(r)\right)$ is the sample entropy of X (a quantity distinct from the wavelet energy $E_j$ defined below).

Wavelet energy (E). The energy of a frequency band represents the importance of that component in composing the time series 67. The coefficients generated through Eq. 1 can be used to calculate the energy of their components 56. Decomposing a time series with wavelets requires a number of levels J, which is chosen based on the frequency range to be analyzed 68. When a time series is decomposed, it generates J + 1 sets of coefficients, which correspond to J sets of detail coefficients and 1 set of approximation coefficients 69. The relative Wavelet Energy (WE) $E_j$ of band j is defined as follows:

$$E_j = \frac{\sum_{k=1}^{m_j} \left| C_j(k) \right|^{2}}{\sum_{i=1}^{J+1} \sum_{k=1}^{m_i} \left| C_i(k) \right|^{2}},$$

where $C_j(k)$ is the k-th coefficient of band j and $m_j$ is the number of coefficients associated with band j.

Quantile graphs (Δ). A time series analysis method that converts a time series into a Quantile Graph (QG) has recently emerged from the concepts of complex networks 39,70. A complex network g = {N, L} is defined as a group of N nodes connected by L edges. In this method, a time series is coarse-grained into Q quantiles $q_1, \ldots, q_Q$.
Each quantile $q_i$ represents a network node $n_i$, and each weight $a^k_{ij}$ in the weighted directed adjacency matrix $A_k$ is equal to the number of times a value in quantile $q_i$ at time t is followed by a value in quantile $q_j$ at time t + k. This method creates directed weighted networks with $N_{QG} = Q$ vertices.
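The quantile-graph construction just described can be sketched as follows. This is an illustrative version under stated assumptions, not the authors' implementation: samples are assigned to quantiles by rank (one possible convention), and the function names `quantile_labels`, `quantile_graph` and `markov_matrix` are hypothetical. Row-normalizing the counts gives a Markov transition matrix.

```python
def quantile_labels(x, Q):
    """Assign each sample to one of Q equally populated bins (0 .. Q-1) by rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    labels = [0] * len(x)
    for rank, i in enumerate(order):
        labels[i] = rank * Q // len(x)
    return labels

def quantile_graph(x, Q, k=1):
    """Adjacency matrix A_k: A[i][j] counts transitions from bin i at t to bin j at t + k."""
    labels = quantile_labels(x, Q)
    A = [[0] * Q for _ in range(Q)]
    for t in range(len(x) - k):
        A[labels[t]][labels[t + k]] += 1
    return A

def markov_matrix(A):
    """Row-normalize the counts so that each non-empty row sums to one."""
    W = []
    for row in A:
        total = sum(row)
        W.append([a / total if total else 0.0 for a in row])
    return W

# Toy periodic series (hypothetical): values cycle through four levels.
x = [1, 2, 3, 4, 1, 2, 3, 4]
A = quantile_graph(x, Q=4, k=1)
W = markov_matrix(A)
```

For this toy series each level falls in its own quantile, so all weight sits on the transitions 0→1, 1→2, 2→3 and 3→0, and the entries of `A` sum to the number of transitions, len(x) − k.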
Based on the adjacency matrix $A_k$ and the Markov transition matrix $W_k$ (obtained by normalizing each row of $A_k$ to sum to one), mathematical metrics can be used to quantify different features of the corresponding network's topology. In previous studies, the mean jump length, Δ(k), was successfully used to characterize quantile graphs 23,30,37,39,71. It is defined as follows:

$$\Delta(k) = \frac{1}{Q} \operatorname{tr}\left( P\, W_k^{T} \right),$$

where $W_k^{T}$ is the transpose of $W_k$, P is a Q × Q matrix with elements $p_{i,j} = |i - j|$, and tr is the trace operation.

Visibility graphs (I). Another time series analysis method based on complex network theory converts a time series into a Visibility Graph (VG) 72. In this method, each point of a time series X is represented by a node in the corresponding network. Two nodes, $n_i$ and $n_j$, are connected if their corresponding points (i, x(i)) and (j, x(j)) in the time series are "visible" to each other, that is, if every point (k, x(k)) between them satisfies the relationship:

$$x(k) < x(j) + \left( x(i) - x(j) \right) \frac{j - k}{j - i}.$$

The VG method produces undirected unweighted networks with $N_{VG} = T$ vertices each. In previous studies, the complexity index I was successfully used to characterize visibility graphs 52,73. It is defined as follows:

$$I = 4c\,(1 - c),$$

with:

$$c = \frac{\lambda_{\max} - 2\cos\left( \pi / (N_{VG} + 1) \right)}{N_{VG} - 1 - 2\cos\left( \pi / (N_{VG} + 1) \right)},$$

where $\lambda_{\max}$ is the largest eigenvalue of the corresponding adjacency matrix.
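The visibility criterion can be made concrete with a short sketch. It checks every pair of points directly against all the points between them, which is the simplest (and slowest) way to build the graph; the function name `visibility_graph` and the toy series are hypothetical, and the eigenvalue step needed for the complexity index is left out.

```python
def visibility_graph(x):
    """Return the undirected edges (i, j), i < j, of the natural visibility graph of x."""
    T = len(x)
    edges = set()
    for i in range(T - 1):
        for j in range(i + 1, T):
            # Points i and j see each other if every intermediate point lies
            # strictly below the straight line joining (i, x[i]) and (j, x[j]).
            visible = all(
                x[k] < x[j] + (x[i] - x[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges

# Toy series (hypothetical): the peak at index 1 hides index 0 from indices 2 and 3.
series = [1, 3, 2, 3]
edges = visibility_graph(series)
```

Adjacent points are always connected, since there is nothing between them to block the view. The graph has one node per sample, so its adjacency matrix has T × T entries.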

Results
We applied the previously described methods to the problem of discriminating normal controls from patients with AD, based on the 19 EEG channels available. For all channels, we calculated C, F, Q, E, Δ, and I for the groups A, B, C, and D.
C was calculated for all possible combinations of a given electrode with the others. F and Q were calculated using T = 1,024. Q was calculated using r = 0.05, 0.10, 0.15, ..., 1.00 with m = 1 and m = 2 63. Δ was calculated using $N_{QG} = Q = 2(1{,}024)^{1/3} \approx 20$ and k = 1, 2, 3, ..., 25 37. I was calculated using $N_{VG} = T = 1{,}024$. For all the methods, the parameters were chosen so as to obtain the lowest p value of the ANOVA test.
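To show how the tolerance sweep listed above enters the quadratic entropy Q = E + ln(2r) of Eq. 10, here is a minimal sketch. It assumes that r is expressed in the same units as the signal (whether the study scaled r by the signal's standard deviation is not stated here) and uses the usual sample-entropy convention of T − m templates for both lengths. The function names and the toy series are hypothetical.

```python
import math

def count_matches(x, m, r, n_templates):
    """Count ordered pairs (i, j), i != j, of length-m templates within distance r.

    The distance is the largest absolute difference between matching samples.
    """
    count = 0
    for i in range(n_templates):
        for j in range(n_templates):
            if i != j and max(abs(x[i + k] - x[j + k]) for k in range(m)) <= r:
                count += 1
    return count

def quadratic_entropy(x, m, r):
    """Sample entropy -ln(A/B) plus the ln(2r) correction; NaN if no matches exist."""
    n = len(x) - m  # same number of templates for lengths m and m + 1
    B = count_matches(x, m, r, n)
    A = count_matches(x, m + 1, r, n)
    if A == 0 or B == 0:
        return float("nan")
    # The normalization constants of A^m(r) and B^m(r) cancel in the ratio.
    return -math.log(A / B) + math.log(2 * r)

# Tolerance grid quoted in the text: 0.05, 0.10, ..., 1.00.
tolerances = [round(0.05 * i, 2) for i in range(1, 21)]

# Toy, perfectly regular series (hypothetical): sample entropy is zero,
# so Q reduces to ln(2r).
regular = [0, 1] * 10
q_regular = quadratic_entropy(regular, m=1, r=0.1)
```

The double loop over templates costs O(T^2) comparisons per (m, r) pair, so a full sweep over 20 tolerances on 1,024-point segments is noticeably slower than the single toy call shown here.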
The Daubechies wavelet filter was used to decompose all the EEG signals into the well-known frequency bands of the Alzheimer's neural rhythmic activity 74-79, i.e., delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-15 Hz), and beta (15-30 Hz). The comparisons were made between the groups A vs. C and B vs. D to avoid mixing eye conditions.
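A band decomposition of this kind, together with the relative wavelet energy defined in Methods, can be sketched with the Haar wavelet, the simplest member of the Daubechies family. This is only an illustration: the wavelet order used in the study is not stated here, the function names are hypothetical, and the mapping of levels to EEG bands depends on the sampling rate (detail level j spans roughly fs/2^(j+1) to fs/2^j Hz).

```python
import math

def haar_dwt(x, levels):
    """Haar decomposition returning [D1, D2, ..., DJ, AJ].

    Assumes len(x) is a multiple of 2**levels.
    """
    coeffs = []
    approx = list(x)
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        coeffs.append(detail)
    coeffs.append(approx)  # J detail sets plus 1 approximation set
    return coeffs

def relative_energy(coeffs):
    """Energy of each coefficient set divided by the total energy."""
    energies = [sum(c * c for c in band) for band in coeffs]
    total = sum(energies)
    return [e / total for e in energies]

# Toy signal (hypothetical) decomposed with J = 2 levels.
coeffs = haar_dwt([4.0, 2.0, 6.0, 8.0], levels=2)
rel = relative_energy(coeffs)
```

The relative energies sum to one by construction. This is why the measure carries no information when a signal is treated as a single band, where its relative energy is trivially 1.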
For the original, non-filtered signals (Fig. 1), Δ and C produce the best results in distinguishing groups A and C. The worst results were produced by the WE method, with E = 1. As for the filtered signals, C shows the best discrimination power when the beta, alpha, and theta frequency bands are considered (Figs. 2, 3, 4), especially in the temporal and occipital lobes. Also, regardless of the feature extraction method, the use of delta waves produces the best differentiation results (Fig. 5). In this case, Δ followed by C produce the best results, independently of the electrode placement. This result is typically found in AD patients at a later stage of the disease, and corroborates preliminary findings 37.

The same analysis was also performed considering the groups B and D (eyes closed). Figures 6, 7, 8, 9 and 10 depict the location of the scalp electrodes for the 19 EEG channels, which are represented by circles and colored according to the p value for the C, F, Q, E, Δ, and I measures, respectively. For a given measure, the average p value over all the electrodes is also displayed in each figure. Darker-colored circles indicate a better distinction between healthy aging and AD. For the original, non-filtered signals (Fig. 6), Δ followed by C produce the best results for distinguishing groups B and D. Again, since E is equal to 1.0, the WE technique is insensitive to the original EEG signal. As for the frequency bands, C followed by Δ produce the best results for all bands, especially delta (Figs. 7, 8, 9, 10). Overall, the best differentiation between healthy and probable AD patients is obtained using delta waves, for all feature extraction methods, regardless of whether the subject's eyes are open or closed.

Figure 11 presents the boxplots for C, F, Q, E, Δ, and I, taking into account the electrode placement, the frequency band and the eye condition that best distinguish healthy from AD patients in each case.
Note that Q and Δ show excellent performance in discriminating patients with different health conditions, with AUC = 1.0 in both cases. According to some studies 52,56, the collapse of functional connectivity caused by the loss of neuronal synapses slows the brain's oscillatory activity and, thus, the neural activity tends to be less complex. For C, E, and I, this behavior translates into lower values for unhealthy patients; see, respectively, electrodes T3-O2, F7, and F3 in Fig. 11a, d, and f. The opposite trend (unhealthy values higher than healthy ones) is found in F, Q, and Δ; see, respectively, electrodes F1, T6, and F3 in Fig. 11b, c, and e. Note that our results for F and Q do not agree with refs. 63,80. In particular, ref. 80 used the same database as the current study, but a much smaller sample.

Table 1 summarizes the best performance of each technique for the measures C, F, Q, E, Δ, and I, based on the corresponding p values and AUCs. In each case, the electrode placement, the frequency band, and the eye condition were chosen so as to best discriminate patients under different health conditions. Although all measures were able to differentiate healthy from AD patients, Δ displays the best results, regardless of the eye condition (closed or open).

Based on the values of the measures C, F, Q, E, Δ, and I, a support vector machine was used to individually differentiate healthy elderly subjects from patients with AD. The accuracy (Acc), the sensitivity (Sen), and the specificity (Spe) were calculated (Table 2) using k-fold cross-validation with k = 10 and the EEG signals under the same conditions established for the previous analysis (Table 1). Although all measures were able to properly classify patients in different health conditions, Δ achieved the best performance, with Acc = 100%, Sen = 100%, and Spe = 100% (Table 2).

Finally, the execution time of each method was measured as a function of the time series length. In each case, the initial and the final time series lengths were T = 500 and T = 10,000 points, respectively.
In each step, the time series length was increased by 100 time points, and the computational cost was normalized by the cost of the initial step (Fig. 12). Overall, F (FD) followed by E (WE) and Δ (QG) required the lowest computational effort, while the time spent by the I (VG) method was the highest. This is due to the increasing size of the matrix that this method must compute, which reaches 10,000 × 10,000 elements in the final step of the simulations.
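The cost measurement described above can be reproduced in outline with the sketch below, which times one method on series of growing length and normalizes by the first timing. Katz's fractal dimension serves as the example method, with the absolute amplitude difference taken as the distance between points (an assumption; a Euclidean distance in the time-amplitude plane is another common choice). The random test signals and the function names are hypothetical, and absolute timings depend on the machine.

```python
import math
import random
import time

def katz_fd(x):
    """Katz fractal dimension with |x_i - x_j| as the distance between points."""
    n = len(x) - 1                                          # number of steps
    D = sum(abs(x[i + 1] - x[i]) for i in range(n))         # total curve length
    d = max(abs(x[i] - x[0]) for i in range(1, len(x)))     # farthest point from the start
    return math.log10(n) / (math.log10(n) + math.log10(d / D))

def normalized_cost(method, lengths, seed=0):
    """Time `method` on random series of each length, relative to the first length."""
    rng = random.Random(seed)
    times = []
    for T in lengths:
        x = [rng.gauss(0.0, 1.0) for _ in range(T)]
        start = time.perf_counter()
        method(x)
        times.append(time.perf_counter() - start)
    return [t / times[0] for t in times]

# Lengths from T = 500 to T = 10,000 in increments of 100 points, as in the text.
lengths = range(500, 10001, 100)
costs = normalized_cost(katz_fd, lengths)
```

Katz's estimator makes a single pass over the series, so its normalized cost should grow roughly linearly with T, whereas a method that builds a T × T matrix grows at least quadratically.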

Discussion
In this paper, the automatic detection of Alzheimer's disease was performed based on six methods commonly used in the literature, which were compared through the measures C, F, Q, E, Δ, and I according to the corresponding p values and AUCs. Although most of the measures could distinguish between healthy and AD patients (with the exception of E), Δ followed by C display the best results for the original signals, regardless of the electrode placement or the eye condition. In terms of the effect of the frequency bands, regardless of the method employed, delta waves provide the best differentiation. This finding is consistent with the prior knowledge that the patients under study may have the disease in its late stage. The values of Acc (100%), Sen (100%) and Spe (100%), obtained by the measure Δ in combination with k-fold cross-validation, indicate that this measure is the most efficient for the classification of individual patients, regardless of their eye condition.

Many research groups have attempted to use computational EEG signal analysis, and many different methods have shown promising results in detecting brain diseases 25-29. Methods derived from information theory, time-frequency decomposition, and graph theory are examples of computational tools used in distinguishing healthy from unhealthy subjects 30-36. Several studies in the literature have shown that AD causes slowing in EEG rhythms, reduction in EEG complexity, and changes in synchrony among brain regions 22. In this sense, the aim of this study was to apply different methods of detecting AD through EEG and to investigate the properties of the signals that distinguish the groups of patients. In particular, the capacity for distinguishing AD patients from healthy elderly subjects was evaluated in terms of the area under the ROC curve and the ANOVA test.
A previous study reported high accuracy in detecting AD through EEG using a small set of signals from 24 AD patients and 24 healthy controls 37. In this paper, the database was extended to 160 AD patients, and different computational methods were applied in order to compare their performances in terms of distinguishing groups under different health conditions and in terms of computational cost.

Figure 11. Boxplots for the best electrodes and frequency bands for C, F, Q, E, Δ, and I, respectively.

The main goal of the current investigation was to explore the best algorithms capable of differentiating well-established AD from normal subjects. Furthermore, our special interest was to evaluate the performance of a new approach known as quantile graphs. It is worth mentioning that, according to Rossini 6, early Alzheimer's disease detection can be performed with the use of quantitative EEG with an accuracy of up to 98%. In the future, the methodology proposed by Rossini may be used in association with the one described here in order to improve the early diagnosis of this disease through EEG signals.
Limitations of the study. Most of the computational methods presented here were able to distinguish between healthy individuals and AD patients. However, it is worth mentioning that neither the AD patients nor the healthy controls under study received a definitive pathological diagnosis. As a result, some clinical features of the disease are missing from the database, making it difficult to estimate the efficacy of the methods in providing an early diagnosis for AD patients with only mild cognitive impairment.

Conclusion
Although early detection is critical to improving the quality of life of AD patients, most of the presently available diagnostic tools, from volumetric magnetic resonance imaging (MRI) to lumbar puncture, are invasive, expensive, or poorly available in community health facilities 6. In this paper, we applied six non-linear time-series analysis methods to EEG records from 160 AD patients and 24 healthy controls. Our goal was to evaluate the sensitivity and robustness of each of the six methods on the task of discriminating AD patients from healthy subjects.
With the exception of the wavelet energy (E) method, all the other five computational measures were able to distinguish between healthy and AD patients. More specifically, quantile graphs (Δ) followed by wavelet coherence (C) generated the best results using the original, non-filtered signals, regardless of the electrode placement or the eye condition (open or closed). As for the wavelet-filtered signals, the use of delta wave signals improved the discriminating power of all methods, which indicates that the AD patients under study may have the disease in its late stage. Taking into account the electrode placement, the frequency band and the eye condition, and considering only the best results obtained by each method, Δ showed the best performance in discriminating patients with different health conditions, with Acc = 100%, Sen = 100% and Spe = 100% (see Table 2). Finally, taking into account the discrimination performance and the computational cost together, Δ followed by C are the most recommended methods for the AD diagnosis problem.
As for topics for future research, it is necessary to re-evaluate the performance of the methods used in this study when they are applied to a data set that includes patients at different stages of AD, including mild cognitive impairment (MCI) that later evolves to AD. In addition, the use of low resolution electromagnetic tomography (LORETA) to detect the onset of MCI and AD 81 should also be considered.